Unsupervised Named Entity Resolution

Authors

Ioannis P. Klapaftis

Suresh Manandhar

Abstract

Resolving the ambiguity of person, organisation and location names is a challenging problem in Natural Language Processing (NLP). This problem is usually formulated as a clustering problem, in which the goal is to group mentions of the same entity into the same cluster. In this paper, we present a different approach based on the Distributional Hypothesis and edit distance, which associates an ambiguous entity with its corresponding entry in the Wikipedia knowledge base. We experiment with two types of contextual features, i.e. bag-of-words and bigrams, as well as with another source of information, i.e. the edit distance between an entity mention and a Wikipedia article's title. Our experiments show that the combination of these types of knowledge performs better than each one individually or any subset of them, leading to the conclusion that they capture non-overlapping information that is essential for this task.
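
To make the idea in the abstract concrete, the following is a minimal sketch of how the three signals it names (bag-of-words context, bigram context, and edit distance between the mention and an article title) could be combined to rank candidate Wikipedia entries. The abstract does not say how the signals are combined or weighted, so the cosine similarity, the length-normalised edit distance, the equal weights, and all function names (`score`, `resolve`, and so on) are assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from math import sqrt


def tokens(text):
    # Naive whitespace tokenisation; a real system would do more (assumption).
    return text.lower().split()


def bigrams(words):
    # Adjacent word pairs, e.g. ["a", "b", "c"] -> [("a", "b"), ("b", "c")].
    return list(zip(words, words[1:]))


def cosine(a, b):
    # Cosine similarity between two Counter feature vectors.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def edit_distance(s, t):
    # Levenshtein distance computed row by row.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = curr
    return prev[-1]


def title_similarity(mention, title):
    # Turn edit distance into a similarity in [0, 1] (normalisation is an assumption).
    longest = max(len(mention), len(title))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(mention.lower(), title.lower()) / longest


def score(mention, context, title, article_text, weights=(1 / 3, 1 / 3, 1 / 3)):
    # Equal weights are a placeholder; the abstract gives no weighting scheme.
    ctx, art = tokens(context), tokens(article_text)
    bow = cosine(Counter(ctx), Counter(art))
    big = cosine(Counter(bigrams(ctx)), Counter(bigrams(art)))
    title_sim = title_similarity(mention, title)
    return weights[0] * bow + weights[1] * big + weights[2] * title_sim


def resolve(mention, context, candidates):
    # candidates maps an article title to its text; return the best-scoring title.
    return max(candidates, key=lambda title: score(mention, context, title, candidates[title]))
```

Calling `resolve(mention, context, candidates)` with a dictionary of candidate article titles and texts returns the title with the highest combined score. Setting one of the weights to zero gives a simple way to compare subsets of the three signals, in the spirit of the comparison the abstract describes.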

Similar resources

Corpus based coreference resolution for Farsi text

"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

Structured Generative Models for Unsupervised Named-Entity Clustering

We describe a generative model for clustering named entities which also models named entity internal structure, clustering related words by role. The model is entirely unsupervised; it uses features from the named entity itself and its syntactic context, and coreference information from an unsupervised pronoun resolver. The model scores 86% on the MUC-7 named-entity dataset. To our knowledge, t...

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

Incorporating Unsupervised Features into CRF based Named Entity Recognition

We participated in the complaint and diagnosis extraction task and the complaint and diagnosis normalization task of MedNLP2 in NTCIR11. In the extraction task, we use a CRF-based Named Entity Recognition method. Moreover, we incorporate unsupervised features learned from a raw corpus into the CRF. We show that such unsupervised features improve system performance.

Automatic Gazetteer Generation from Wikipedia

The presence of a high-quality Named Entity gazetteer within a CLIR system is crucial in order to provide multilingual access to digital resources, particularly in the domain of Digital Libraries. In our paper we investigate an approach for automatically extracting this kind of resource from Wikipedia using an unsupervised approach that leverages the DBpedia classification of the English article...

ANEAR: Automatic Named Entity Aliasing Resolution

Identifying the different aliases used by or for an entity is emerging as a significant problem in reliable Information Extraction systems, especially with the proliferation of social media and their ever growing impact on different aspects of modern life such as politics, finance, security, etc. In this paper, we address the novel problem of Named Entity Aliasing Resolution (NEAR). We attempt ...

Publication date: 2010
